Abstract

Consider a storage system, such as an inventory or cash fund, whose
content fluctuates as a $(\mu, \sigma^2)$ Brownian motion in the absence
of control. Holding costs are continuously incurred at a rate
proportional to the storage level, and we may cause the storage level to
jump by any desired amount at any time, except that the content must be
kept nonnegative. Both positive and negative jumps entail fixed plus
proportional costs, and our objective is to minimize expected discounted
costs over an infinite planning horizon. A control band policy is one
that enforces an upward jump to $q$ whenever level zero is hit, and
enforces a downward jump to $Q$ whenever level $S$ is hit
($0 < q < Q < S$). We prove the existence of an optimal control band
policy and calculate explicitly the optimal values of the critical
numbers $(q, Q, S)$.

IMPULSE CONTROL OF BROWNIAN MOTION


J. Michael Harrison, Stanford University
Thomas M. Sellke, Stanford University
Allison J. Taylor, Queen's University


1. Introduction and Summary

Consider a controller who continuously monitors the content, or state,
of a storage system. In the absence of control, the content process
$Z = \{Z_t, t \ge 0\}$ fluctuates as a Brownian motion with drift $\mu$
and variance $\sigma^2$. The controller can at any time increase or
decrease the content of the system by any amount desired, but he is
obliged to keep $Z_t \ge 0$, and there are three types of cost to be
considered.

(1.1) In order to effect an increase from level $x$ to level
$x + \delta$, the controller must pay a fixed charge $K$ plus a
proportional charge $k\delta$.

(1.2) Similarly, it costs $L + \ell\delta$ to effect a decrease from
level $x$ to level $x - \delta$.

(1.3) Inventory holding costs are continuously incurred at rate $hZ_t$.

Thus we have linear holding costs and fixed plus proportional costs of
control. We seek a policy that will minimize, subject to the constraint
$Z_t \ge 0$, the expected present value of holding costs and control
costs incurred over an infinite planning horizon, where future costs are
continuously discounted at interest rate $\gamma > 0$.

For a concrete application, one may consider the so-called stochastic
cash management problem. Here $Z_t$ represents the content at time $t$
of a cash fund, into which a certain amount of income or revenue is
automatically channelled and out of which operating disbursements are
made. Interpret $h$ as the opportunity loss rate for cash held within
the fund, meaning that $h$ is the amount of income per period that could
have been earned by a dollar of cash if it had been invested in
securities. When the content of the cash fund gets too large, the
controller may choose to convert some of his cash into securities, and
for this he pays a fixed transaction cost $K$ plus a proportional cost
of $k$ times the transaction size. On the other hand, he may at any time
convert securities into cash, this too involving fixed plus proportional
transaction costs.

It is more or less obvious that there exists for this problem an optimal
policy of the type pictured in Figure 1. Using the language of the
stochastic cash balance problem, this control band policy may be
described as follows. First, it is characterized by three parameters
$(q, Q, S)$ satisfying $0 < q < Q < S$, and for future reference we
define

Assuming that $K$ and $L$ are strictly positive, as we shall do
throughout, our problem is one of impulse control [4]. This means that
the controller exerts his influence through lump sum displacements
effected at isolated points in time. The impulse control problem is
quite easy to formulate in precise mathematical terms, and we shall do
this shortly. Harrison and Taylor [6] have studied the analogous problem
with proportional control costs only ($K = L = 0$), which requires a
more subtle formulation but is easier to solve explicitly. With this
cost structure, it was shown that the optimal control policy enforces an
upper reflecting barrier at $Q$ and a lower reflecting barrier at zero,
where $Q$ is the unique solution of a certain transcendental equation.
Roughly speaking, this barrier policy is the limit, as $s$ and $q$ both
approach zero, of the control band policy pictured in Figure 1. The
controller exerts influence at uncountably many time points, but the
total amount of upward or downward displacement effected during any
finite period is finite. The controller can obviously effect
instantaneous state changes in this problem, so the state constraint
$Z_t \ge 0$ makes sense, and yet policies cannot be described through a
discrete sequence of intervention times. Harrison and Taksar [7] have
coined the term instantaneous control to describe that state of affairs,
and the interested reader may see [7] and [3] for analyses of other such
problems.

Harrison and Taylor [6] also considered the case where $K > 0$ and
$L = 0$, obtaining an optimal policy that imposes an upper reflecting
barrier at $S$ and enforces an upward jump to $q$ whenever level zero is
hit ($0 < q < S$). Such a policy can be obtained by letting $s \to 0$ in
the control band policy of Figure 1. Finally, Constantinides and Richard
[5] have studied a Brownian impulse control problem more general than
ours. (To be more precise, our problem can be obtained by letting a
certain cost parameter approach $\infty$ in their formulation.) They
prove the existence of a structured optimal policy but do not show how
to compute its critical numbers except for certain simple special cases.

In this paper we show that an optimal control band policy exists for the
impulse control problem, and we determine explicitly the optimal policy
parameters $(q, Q, S)$. The optimal policies of Harrison and Taylor [6]
can be obtained by letting one or both of the fixed control costs
approach zero in our formulas. In addition, our mathematical development
is cleaner and more nearly self-contained than that in [6], and we give
better economic and probabilistic interpretations for our results. To
briefly summarize those results, let us first define

(1.4)  $c = h/\gamma + k$  and  $r = h/\gamma - \ell$.

It will be shown that the original problem is completely equivalent to
another impulse control problem with the following cost structure.

(1.5) When an upward jump of size $\delta$ is effected, the controller
incurs a fixed cost $K$ plus a proportional cost $c\delta$.

(1.6) When a downward jump of size $\delta$ is effected, the controller
incurs a fixed cost $L$ but earns a proportional reward $r\delta$.

(1.7) There are no holding costs.

To understand this equivalence, note first that $h/\gamma$ is the
discounted cost of holding one unit of stock in inventory forever. Under
the cost structure (1.5)-(1.7), our controller is charged this full
infinite-horizon holding cost for each unit of stock that he introduces
into the system, he is credited with a refund of equal size each time he
removes a unit of stock from the system, and no holding costs are
incurred in the interim. Except for certain uncontrollable terms, this
cost structure is found to be identical to the original one, where
holding costs are charged continuously according to the current stock on
hand. To avoid uninteresting degeneracies, we assume throughout that

(1.8)  $0 < r < c < \infty$.

For any choice of policy parameters satisfying $0 < q < Q < S$, there
exists a unique function $\pi$ on $[0,S]$ satisfying

(1.9)  $\tfrac{1}{2}\sigma^2 \pi''(x) + \mu \pi'(x) - \gamma \pi(x) = 0$,  $0 \le x \le S$,

(1.10)  $\pi(Q) = \pi(S) = r$,

and one can furthermore write out an explicit and relatively simple
formula for $\pi$ in terms of the policy parameters $Q$ and $S$. The
function $\pi$ is strictly convex, with a minimum between $Q$ and $S$,
and there is exactly one choice of the policy parameters $(q, Q, S)$
such that

(1.11)  $\int_Q^S [r - \pi(x)]\,dx = L$,

(1.12)  $\pi(q) = c$,

and

(1.13)  $\int_0^q [\pi(x) - c]\,dx = K$,

as depicted in Figure 2 below. These are the parameters of the optimal
control band policy, and the associated function $\pi$ is the derivative
of the optimal value function. It will be shown that (1.11) alone
determines $s \equiv S - Q$, after which (1.12) determines
$\Delta \equiv Q - q$, and then (1.13) determines $q$. This three-step
algorithm for determination of the optimal parameters will be written
out in algebraic form, and interpretations of the conditions
(1.11)-(1.13) will be given.

Figure 2: Optimal Policy Parameters (graph of $\pi(x)$)

The paper is organized as follows. In §2 we give a precise formulation
of our impulse control problem, prove equivalence of the cost structures
(1.1)-(1.3) and (1.5)-(1.7), and lay out some other preliminary
propositions. Section 3 is devoted to characterization of control band
policies. In §4 we show that there exists a unique set of policy
parameters $(q, Q, S)$ satisfying a certain set of conditions, and we
rigorously prove the optimality of the corresponding control band
policy. Finally, §5 develops interpretations for the optimality
conditions taken as primitive in §4.

2. Problem Formulation and Preliminaries

The data for our problem are a drift parameter $\mu$, a variance
parameter $\sigma^2 > 0$, fixed control costs $K > 0$ and $L > 0$,
proportional control cost rates $k$ and $\ell$, a holding cost rate $h$,
and an interest rate $\gamma > 0$. Defining $c = h/\gamma + k$ and
$r = h/\gamma - \ell$, we assume throughout that $0 < r < c$.

Let $\Omega$ be the space of all continuous functions
$\omega : [0,\infty) \to R$ (the real line). For $t \ge 0$ let
$X_t : \Omega \to R$ be the coordinate projection map
$X_t(\omega) = \omega(t)$. Then $X = (X_t, t \ge 0)$ is simply the
identity map $\Omega \to \Omega$. Let $F = \sigma(X_t, t \ge 0)$ denote
the smallest $\sigma$-field such that $X_t$ is $F$-measurable for each
$t \ge 0$, and similarly let $F_t = \sigma(X_s, 0 \le s \le t)$ for
$t \ge 0$. Hereafter, when we speak of adapted processes and stopping
times, the underlying information structure (filtration) is understood
to be $(F_t, t \ge 0)$. Finally, for each $x \in R$ let $P_x$ be the
unique probability measure on $(\Omega, F)$ such that $X$ is a Brownian
motion with drift $\mu$, variance $\sigma^2$ and starting state $x$
under $P_x$. Let $E_x$ be the associated expectation operator.

A policy consists of a sequence of stopping times $\{T_0, T_1, \ldots\}$
and a sequence of random variables $\{\xi_0, \xi_1, \ldots\}$ such that

(2.1)  $P_x(0 = T_0 < T_1 < \cdots \to \infty) = 1$,  for all $x \in R$,

(2.2)  $\xi_n \in F_{T_n}$,  for all $n = 0, 1, \ldots$.

Interpret $T_n$ as the $n$th time at which the controller enforces a
jump in the state of the system, with $\xi_n$ the size of the jump
(either positive or negative) enforced. The convention $T_0 = 0$ will
prove to be convenient, but then we must of course allow $\xi_0 = 0$.
We associate with a policy $\{(T_n, \xi_n)\}$ the processes

$N(t) = \sup\{n \ge 0 : T_n \le t\}$,  $t \ge 0$,

$Y_t = \xi_0 + \cdots + \xi_{N(t)}$,  $t \ge 0$,

$Z_t = X_t + Y_t$,  $t \ge 0$.

(The time parameter of a given process may be written either as a
subscript or as a functional argument, depending on which is more
convenient.) Note that $N$, $Y$ and $Z$ are all adapted and right
continuous with left limits. The policy $\{(T_n, \xi_n)\}$ is said to be
feasible if

(2.3)  $P_x(Z_t \ge 0 \text{ for all } t \ge 0) = 1$,  for all $x \in R$,

(2.4)  $E_x\left[\sum_{n=0}^{\infty} (1 + |\xi_n|)\, e^{-\gamma T_n}\right] < \infty$,  for all $x \in R$.

Setting

(2.5)  $\psi(\xi) = \begin{cases} K + k\xi, & \text{if } \xi > 0, \\ 0, & \text{if } \xi = 0, \\ L - \ell\xi, & \text{if } \xi < 0, \end{cases}$

we define the cost function for a feasible policy $\{(T_n, \xi_n)\}$ by

(2.6)  $C(x) = E_x\left[h \int_0^\infty e^{-\gamma t} Z_t\,dt + \sum_{n=0}^{\infty} e^{-\gamma T_n} \psi(\xi_n)\right]$

for all $x \in R$. From (2.4) it follows that $C(x)$ is both well
defined and finite for all $x \in R$. We say that this policy is optimal
if it minimizes $C(x)$, over all feasible policies, for each $x \in R$.

Now let

(2.7)  $\phi(\xi) = \begin{cases} -K - c\xi, & \text{if } \xi > 0, \\ 0, & \text{if } \xi = 0, \\ -L - r\xi, & \text{if } \xi < 0, \end{cases}$

so that $\phi(\xi) = -\psi(\xi) - (h/\gamma)\xi$, where $\psi$ is the
control cost function of (2.5). For each feasible policy
$\{(T_n, \xi_n)\}$ define the value function

(2.8)  $V(x) = E_x\left\{\sum_{n=0}^{\infty} e^{-\gamma T_n} \phi(\xi_n)\right\}$,  $x \in R$.

Obviously $C(x)$ is the expected present value of total costs, starting
in state $x$, under our original cost structure (1.1)-(1.3), while
$V(x)$ is the expected present value of net rewards under the alternate
cost structure (1.5)-(1.7).
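The relation between the control cost function of (2.5) and the reward function of (2.7) can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names and the parameter values are illustrative assumptions.

```python
def control_cost(xi, K, L, k, ell):
    """Jump cost of (2.5) under the original cost structure (1.1)-(1.2)."""
    if xi > 0:
        return K + k * xi
    if xi < 0:
        return L - ell * xi
    return 0.0


def jump_reward(xi, K, L, c, r):
    """Net reward of (2.7) under the alternate cost structure (1.5)-(1.6)."""
    if xi > 0:
        return -K - c * xi
    if xi < 0:
        return -L - r * xi
    return 0.0


# Hypothetical data, chosen only so that 0 < r < c as in (1.8).
h, gamma, k, ell, K, L = 0.5, 0.1, 1.0, 0.5, 2.0, 3.0
c = h / gamma + k      # definition (1.4)
r = h / gamma - ell    # definition (1.4)

for xi in (-2.0, 0.0, 1.5):
    lhs = jump_reward(xi, K, L, c, r)
    rhs = -control_cost(xi, K, L, k, ell) - (h / gamma) * xi
    assert abs(lhs - rhs) < 1e-12
```

The check confirms that each unit of stock added is charged the full discounted holding cost $h/\gamma$ and each unit removed is refunded the same amount, which is the verbal description given after (1.7).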


(2.9) Proposition. For each feasible policy $\{(T_n, \xi_n)\}$ we have
$C(x) = hx/\gamma + h\mu/\gamma^2 - V(x)$ for all $x \in R$. Thus a
feasible policy is optimal if and only if it maximizes $V(x)$, over all
feasible policies, for each $x \in R$.

Remark. Hereafter we shall deal exclusively with the equivalent
maximization problem. This equivalence was used in [6] and is
essentially due to Bell [2].


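Proof. From Fubini's Theorem we have

(2.10)  $E_x\left(\int_0^\infty e^{-\gamma t} Z_t\,dt\right) = E_x\left[\int_0^\infty e^{-\gamma t}(X_t + Y_t)\,dt\right]$

        $= \int_0^\infty e^{-\gamma t}(x + \mu t)\,dt + E_x\left(\int_0^\infty e^{-\gamma t} Y_t\,dt\right)$

        $= x/\gamma + \mu/\gamma^2 + E_x\left(\int_0^\infty e^{-\gamma t} Y_t\,dt\right)$

and

(2.11)  $\int_0^\infty e^{-\gamma t} Y_t\,dt = \int_0^\infty e^{-\gamma t} \sum_{n=0}^{\infty} \xi_n 1_{\{T_n \le t\}}\,dt = \sum_{n=0}^{\infty} \xi_n \int_0^\infty e^{-\gamma t} 1_{\{T_n \le t\}}\,dt$

        $= \sum_{n=0}^{\infty} \xi_n \int_{T_n}^\infty e^{-\gamma t}\,dt = \frac{1}{\gamma} \sum_{n=0}^{\infty} \xi_n e^{-\gamma T_n}$.

Combining (2.5)-(2.8) with (2.10) and (2.11) gives

(2.12)  $C(x) = h E_x\left\{\int_0^\infty e^{-\gamma t} Z_t\,dt\right\} + E_x\left\{\sum_{n=0}^{\infty} e^{-\gamma T_n} \psi(\xi_n)\right\}$

        $= hx/\gamma + h\mu/\gamma^2 + E_x\left\{\sum_{n=0}^{\infty} e^{-\gamma T_n} \left[\frac{h}{\gamma}\xi_n + \psi(\xi_n)\right]\right\}$

        $= hx/\gamma + h\mu/\gamma^2 - E_x\left\{\sum_{n=0}^{\infty} e^{-\gamma T_n} \phi(\xi_n)\right\}$

        $= hx/\gamma + h\mu/\gamma^2 - V(x)$.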
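The pathwise identity (2.11), which turns the discounted integral of the cumulative jump process into a discounted sum over jump times, can be checked on a fixed, deterministic jump schedule. This is a rough numerical sketch under assumed values; the function names, the jump schedule, and the truncation horizon are hypothetical choices, not from the paper.

```python
import math


def discounted_integral_of_jumps(times, sizes, gamma, horizon=200.0, steps=100_000):
    """Midpoint-rule approximation of the integral of e^{-gamma t} Y_t over
    [0, horizon], where Y_t is the sum of all jumps made at or before t."""
    dt = horizon / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        y_t = sum(size for T, size in zip(times, sizes) if T <= t)
        total += math.exp(-gamma * t) * y_t * dt
    return total


def discounted_jump_sum(times, sizes, gamma):
    """Right side of (2.11): (1/gamma) * sum of xi_n * e^{-gamma T_n}."""
    return sum(size * math.exp(-gamma * T) for T, size in zip(times, sizes)) / gamma


# Hypothetical schedule: T_0 = 0 by convention, then two later jumps.
jump_times = [0.0, 1.3, 4.0]
jump_sizes = [2.0, -1.5, 3.0]
gamma = 0.1

numeric = discounted_integral_of_jumps(jump_times, jump_sizes, gamma)
exact = discounted_jump_sum(jump_times, jump_sizes, gamma)
assert abs(numeric - exact) < 1e-2
```

The horizon of 200 time units stands in for infinity; with $\gamma = 0.1$ the neglected tail is of order $e^{-20}$.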


(2.13) Proposition. Suppose that $f : [0,\infty) \to R$ is continuously
differentiable, has a bounded derivative, and has a continuous second
derivative at all but a finite number of points. Then for any $T > 0$,
any $x \in R$ and any feasible policy we have

(2.14)  $E_x[e^{-\gamma T} f(Z_T)] = E_x[f(Z_0)] + E_x\left[\int_0^T e^{-\gamma t} (\Gamma f - \gamma f)(Z_t)\,dt\right] + E_x\left[\sum_{n=1}^{N(T)} \theta_n e^{-\gamma T_n}\right]$,

where

$\theta_n \equiv f(Z(T_n)) - f(Z(T_n-))$,  for $n = 1, 2, \ldots$

and

(2.15)  $\Gamma f = \tfrac{1}{2}\sigma^2 f'' + \mu f'$.

Remark. We may define $f''(y)$ arbitrarily at those points $y$ where the
second derivative does not exist, because $\{t \ge 0 : Z_t = y\}$ has
zero Lebesgue measure almost surely under each $P_x$.

Proof. This is almost identical to Proposition (4.2) of [7], so we shall
merely sketch the proof. Fix $x \in R$, and represent $X$ in the form
$X_t = X_0 + \sigma W_t + \mu t$, where $W$ is a standard (zero drift
and unit variance) Brownian motion (under $P_x$) with $W_0 = 0$. If $f$
is twice continuously differentiable, then we may apply the
one-dimensional change of variable formula (or generalized Ito formula)
for semimartingales, which appears on page 301 of Meyer [8], to obtain

(2.16)  $f(Z_T) = f(Z_0) + \sigma \int_0^T f'(Z_t)\,dW_t + \int_0^T \Gamma f(Z_t)\,dt + \sum_{n=1}^{N(T)} \theta_n$.

In fact, this is making things a bit more difficult than is really
necessary, since the same result can be obtained by applying the
ordinary one-dimensional Ito formula over each of the intervals
$[T_n, T_{n+1})$ and then summing up over $n = 0, 1, \ldots, N(T)$.
Furthermore, it is well known that the Ito formula (or the general
change of variable formula) remains valid even when $f$ is not twice
continuously differentiable, provided that it has an absolutely
continuous derivative $f'$ and $f''$ is chosen as any density of $f'$,
cf. [1]. Thus (2.16) is valid with our hypotheses. Now using (2.16) and
the integration by parts formula for semimartingales, which appears on
page 303 of Meyer [8], we obtain

(2.17)  $e^{-\gamma T} f(Z_T) = f(Z_0) + \int_0^T e^{-\gamma t}\,df(Z)_t - \int_0^T f(Z_{t-})\,\gamma e^{-\gamma t}\,dt$

        $= f(Z_0) + \sigma \int_0^T e^{-\gamma t} f'(Z_t)\,dW_t + \int_0^T e^{-\gamma t}\,\Gamma f(Z_t)\,dt + \sum_{0 < t \le T} e^{-\gamma t}\,\Delta f(Z)_t - \int_0^T f(Z_t)\,\gamma e^{-\gamma t}\,dt$

        $= f(Z_0) + \sigma \int_0^T e^{-\gamma t} f'(Z_t)\,dW_t + \int_0^T e^{-\gamma t} (\Gamma f - \gamma f)(Z_t)\,dt + \sum_{n=1}^{N(T)} e^{-\gamma T_n} \theta_n$.

To get the desired result, we take $E_x$ of both sides in (2.17) and
observe that the expectation of the Ito integral vanishes, because its
integrand is bounded by hypothesis.

(2.18) Proposition. Suppose that $f : [0,\infty) \to R$ satisfies all
the hypotheses of (2.13) plus

(2.19)  $\Gamma f(x) - \gamma f(x) \le 0$  for almost all $x \ge 0$,

(2.20)  $f(x) \ge f(y) - K - (y - x)c$  for $0 \le x < y$,

(2.21)  $f(x) \ge f(y) - L + (x - y)r$  for $0 \le y < x$.

Then $f(x) \ge V(x)$ for any feasible policy and any $x \ge 0$.

Proof. Using the definition (2.7) of $\phi(\cdot)$, we see that (2.20)
and (2.21) together give us $f(x) - f(y) \ge \phi(y - x)$, which means
that

(2.22)  $-\theta_n \ge \phi(Z(T_n) - Z(T_n-)) = \phi(\xi_n)$,  for $n = 1, 2, \ldots$,

where $\theta_n$ is defined as in Proposition (2.13). Putting (2.19) and
(2.22) into (2.14) and rearranging terms, we have

$E_x[f(Z_0)] \ge E_x\left[\sum_{n=1}^{N(T)} \phi(\xi_n)\, e^{-\gamma T_n}\right] + E_x[e^{-\gamma T} f(Z_T)]$.

From (2.4) and the boundedness of $f'$ it follows that
$E_x[\exp(-\gamma T) f(Z_T)] \to 0$ as $T \to \infty$, so we have

(2.23)  $E_x[f(Z_0)] \ge E_x\left[\sum_{n=1}^{\infty} \phi(\xi_n)\, e^{-\gamma T_n}\right]$.

Finally, since $Z_0 = X_0 + \xi_0$, another application of
(2.20)-(2.21) gives

(2.24)  $f(X_0) \ge f(Z_0) + \phi(\xi_0)$.

Of course $E_x[f(X_0)] = f(x)$, so by combining (2.23) and (2.24) we
have the desired result,

$f(x) \ge E_x\left[\sum_{n=0}^{\infty} \phi(\xi_n)\, e^{-\gamma T_n}\right] = V(x)$.

3. Control Band Policies

Let us now consider a control band policy with parameters $(q, Q, S)$
satisfying $0 < q < Q < S$. Remembering that $T_0 = 0$ by definition, we
take

(3.1)  $\xi_0 = \begin{cases} q - X_0, & \text{if } X_0 \le 0, \\ 0, & \text{if } 0 < X_0 < S, \\ Q - X_0, & \text{if } X_0 \ge S, \end{cases}$

and of course $Z_0 = X_0 + \xi_0$. Assuming that it is clear from the
verbal description given in §1 how $T_1, T_2, \ldots$ and
$\xi_1, \xi_2, \ldots$ are recursively constructed, we shall not write
out their formal definitions. The relevant properties of the control
band policy $\{(T_n, \xi_n)\}$ are the following. With
$Y = \{Y_t, t \ge 0\}$ defined as in §2, $Z = X + Y$, and
$s \equiv S - Q$ as before, we have

(3.2)  $Z(T_n-) \in \{0, S\}$  for all $n = 1, 2, \ldots$,

(3.3)  $\xi_n = \begin{cases} q, & \text{if } Z(T_n-) = 0, \\ -s, & \text{if } Z(T_n-) = S. \end{cases}$
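To make the recursive construction concrete, the following sketch simulates a control band policy on an Euler discretization of the Brownian motion. It is only an illustration under stated assumptions: the function name, the time step, and all parameter values are hypothetical, and because the discretized path can overshoot a boundary between grid points, the recorded jump sizes only approximate the exact values $q$ and $-s$ of (3.3).

```python
import math
import random


def simulate_control_band(x0, mu, sigma, q, Q, S, horizon, dt, seed=0):
    """Simulate Z = X + Y under a (q, Q, S) control band policy.

    Returns the controlled path sampled on the time grid and the list of
    (time, jump size) pairs. sigma is the standard deviation per unit time.
    """
    rng = random.Random(seed)
    jumps = []
    z = x0
    # Initial jump xi_0, following (3.1).
    if z <= 0.0:
        jumps.append((0.0, q - z))
        z = q
    elif z >= S:
        jumps.append((0.0, Q - z))
        z = Q
    path = [z]
    n_steps = int(round(horizon / dt))
    for i in range(1, n_steps + 1):
        # Uncontrolled Brownian increment over one time step.
        z += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if z <= 0.0:        # level zero hit: jump up to q
            jumps.append((i * dt, q - z))
            z = q
        elif z >= S:        # level S hit: jump down to Q
            jumps.append((i * dt, Q - z))
            z = Q
        path.append(z)
    return path, jumps


path, jumps = simulate_control_band(
    x0=1.0, mu=0.1, sigma=1.0, q=0.5, Q=1.5, S=3.0, horizon=50.0, dt=0.01, seed=1
)
```

Every sampled value of the controlled path lies strictly inside $(0, S)$, which is the discrete counterpart of the feasibility requirement (2.3).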


We now want to compute explicitly the value function $V$ for this
control band policy. With the differential operator $\Gamma$ defined by
(2.15), it will ultimately be seen that $V$ is twice continuously
differentiable on $[0,S]$ and uniquely satisfies

(3.4)  $\Gamma V(x) - \gamma V(x) = 0$,  $0 \le x \le S$,

subject to the auxiliary conditions

(3.5)  $V(0) = V(q) + \phi(q) = V(q) - K - cq$,

(3.6)  $V(S) = V(Q) + \phi(-s) = V(Q) - L + rs$.

To extend $V$ to a function on all of $R$, we write (3.5) and (3.6) in
the more general form

(3.7)  $V(x) = V(q) - K - c(q - x)$,  for $x \le 0$,

(3.8)  $V(x) = V(Q) - L + r(x - Q)$,  for $x \ge S$.

Now let

(3.9)  $\alpha \equiv [(\mu^2 + 2\gamma\sigma^2)^{1/2} - \mu]/\sigma^2 > 0$,

and

(3.10)  $\beta \equiv [(\mu^2 + 2\gamma\sigma^2)^{1/2} + \mu]/\sigma^2 > 0$,

so that $z = \alpha$ and $z = -\beta$ are the two solutions of the
quadratic equation $\tfrac{1}{2}\sigma^2 z^2 + \mu z - \gamma = 0$. The
general solution of the ordinary differential equation (ODE) (3.4) is

(3.11)  $V(x) = A e^{\alpha x} + B e^{-\beta x}$,  $0 \le x \le S$,

and in our case the constants $A$ and $B$ must be chosen so as to
satisfy (3.5) and (3.6).
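As a quick sanity check on the two exponents in the general solution of the ODE (3.4), the sketch below computes them from the drift, variance and interest rate and verifies that they solve the associated quadratic. The function name and the numerical values are illustrative assumptions.

```python
import math


def characteristic_exponents(mu, sigma, gamma):
    """Return the positive numbers (alpha, beta) of (3.9)-(3.10), so that
    z = alpha and z = -beta solve 0.5*sigma^2*z^2 + mu*z - gamma = 0."""
    root = math.sqrt(mu * mu + 2.0 * gamma * sigma * sigma)
    alpha = (root - mu) / sigma ** 2
    beta = (root + mu) / sigma ** 2
    return alpha, beta


def quadratic_residual(z, mu, sigma, gamma):
    """Left side of the characteristic quadratic evaluated at z."""
    return 0.5 * sigma ** 2 * z * z + mu * z - gamma


mu, sigma, gamma = 0.2, 1.5, 0.1   # hypothetical data
alpha, beta = characteristic_exponents(mu, sigma, gamma)
assert alpha > 0 and beta > 0
assert abs(quadratic_residual(alpha, mu, sigma, gamma)) < 1e-12
assert abs(quadratic_residual(-beta, mu, sigma, gamma)) < 1e-12
```

Since $e^{zx}$ satisfies (3.4) exactly when $z$ solves this quadratic, both $e^{\alpha x}$ and $e^{-\beta x}$ are solutions, and any solution on $[0,S]$ is a linear combination of the two.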

(3.12) Proposition. Let $V$ be defined on $[0,S]$ by (3.11), with $A$
and $B$ chosen so as to satisfy (3.5) and (3.6), and extend $V$ to a
continuous function on all of $R$ by (3.7) and (3.8). Then $V$ is the
value function for the $(q, Q, S)$ control band policy.

Proof. We shall use the fact that this function $V$ satisfies (3.4),
which is easy to verify. The central step in the proof is an application
of Proposition (2.13), with $V$ in place of $f$. Since
$0 \le Z_t \le S$ for all $t \ge 0$, it is sufficient for this
application that $V$ be twice continuously differentiable on $[0,S]$.
With $\theta_n$ defined as in (2.13), we have from (3.3), (3.5) and
(3.6) that

(3.13)  $\theta_n = \begin{cases} L - rs, & \text{if } Z(T_n-) = S, \\ K + cq, & \text{if } Z(T_n-) = 0, \end{cases}$

which means simply that $\theta_n = -\phi(\xi_n)$. Furthermore,
$\Gamma V(Z_t) - \gamma V(Z_t) = 0$ for all $t \ge 0$ by (3.4).
Combining these facts with (2.13) gives

(3.14)  $E_x[e^{-\gamma T} V(Z_T)] = E_x[V(Z_0)] - E_x\left[\sum_{n=1}^{N(T)} e^{-\gamma T_n} \phi(\xi_n)\right]$

for any $T > 0$ and $x \in R$. Next, (3.1), (3.7) and (3.8) give

(3.15)  $V(Z_0) = V(X_0 + \xi_0) = V(X_0) - \phi(\xi_0)$,

so we can rewrite (3.14) as

(3.16)  $E_x[e^{-\gamma T} V(Z_T)] = E_x[V(X_0)] - E_x\left[\sum_{n=0}^{N(T)} e^{-\gamma T_n} \phi(\xi_n)\right]$.

Of course $E_x[V(X_0)] = V(x)$, and the left side of (3.16) vanishes as
$T \to \infty$ because $V(Z_T)$ is bounded, so we obtain the desired
result by letting $T \to \infty$ in (3.16).
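Proposition (3.12) reduces the computation of the value function to a two-by-two linear system for $A$ and $B$. The sketch below solves that system by Cramer's rule and builds the extended function via (3.7)-(3.8). It is an illustrative implementation under assumed parameter values; the function name and the data are hypothetical.

```python
import math


def control_band_value(mu, sigma, gamma, K, L, c, r, q, Q, S):
    """Return (A, B, V) for the (q, Q, S) control band policy, where
    V(x) = A e^{alpha x} + B e^{-beta x} on [0, S] as in (3.11)."""
    root = math.sqrt(mu * mu + 2.0 * gamma * sigma * sigma)
    alpha = (root - mu) / sigma ** 2
    beta = (root + mu) / sigma ** 2

    # (3.5) rearranged: A(e^{alpha q} - 1) + B(e^{-beta q} - 1) = K + c q
    a11 = math.exp(alpha * q) - 1.0
    a12 = math.exp(-beta * q) - 1.0
    b1 = K + c * q
    # (3.6) rearranged:
    # A(e^{alpha S} - e^{alpha Q}) + B(e^{-beta S} - e^{-beta Q}) = -L + r (S - Q)
    a21 = math.exp(alpha * S) - math.exp(alpha * Q)
    a22 = math.exp(-beta * S) - math.exp(-beta * Q)
    b2 = -L + r * (S - Q)

    det = a11 * a22 - a12 * a21
    A = (b1 * a22 - a12 * b2) / det
    B = (a11 * b2 - a21 * b1) / det

    def inner(x):
        return A * math.exp(alpha * x) + B * math.exp(-beta * x)

    def V(x):
        if x < 0.0:                      # extension (3.7)
            return inner(q) - K - c * (q - x)
        if x > S:                        # extension (3.8)
            return inner(Q) - L + r * (x - Q)
        return inner(x)

    return A, B, V


# Hypothetical data with 0 < r < c and 0 < q < Q < S.
A, B, V = control_band_value(
    mu=0.1, sigma=1.0, gamma=0.1, K=1.0, L=1.0, c=6.0, r=4.5, q=0.5, Q=1.5, S=3.0
)
```

Note that the parameters $(q, Q, S)$ here are arbitrary; the resulting $V$ is the value of that particular policy, not the optimal value function.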
4. Optimal Policy Parameters

Continuing the discussion of control band policies, it will be
convenient to define

(4.1)  $\pi(x) \equiv V'(x)$,  for $x \in R$.

Actually, the left and right derivatives of $V$ need not agree at
$x = 0$ and $x = S$, so (4.1) is ambiguous at those points. To resolve
that ambiguity, let us agree to define $\pi$ so that it is continuous on
$[0,S]$. From (3.11) we have

(4.2)  $\pi(x) = \alpha A e^{\alpha x} - \beta B e^{-\beta x}$,  $0 \le x \le S$,

and the conditions (3.5)-(3.6) determining $A$ and $B$ may be rewritten
in terms of $\pi$ as

(4.3)  $\int_0^q [\pi(x) - c]\,dx = K$,

and

(4.4)  $\int_Q^S [r - \pi(x)]\,dx = L$.

In this section it will be shown that there is exactly one choice of the
control band policy parameters $(q, Q, S)$ such that

(4.5)  $\pi(q) = c$

and

(4.6)  $\pi(Q) = \pi(S) = r$,

and the corresponding control band policy is optimal. Interpretations
for (4.5)-(4.6), and the conditions to be derived from them shortly,
will be offered in the next section. For arbitrary $s > 0$ we define

(4.7)  $a(s) \equiv (1 - e^{-\beta s})/(e^{\alpha s} - e^{-\beta s}) > 0$,

(4.8)  $b(s) \equiv (e^{\alpha s} - 1)/(e^{\alpha s} - e^{-\beta s}) > 0$,

and

(4.9)  $f_s(y) \equiv r[a(s)\, e^{\alpha y} + b(s)\, e^{-\beta y}]$,  $y \in R$,

so that $f_s$ uniquely satisfies $\Gamma f_s - \gamma f_s = 0$ subject
to $f_s(0) = f_s(s) = r$. Obviously $f_s'' > 0$ and thus we have the
following:

(4.10) For any $s > 0$ the function $f_s(\cdot)$ is strictly convex on
$R$ and has a minimum in $(0,s)$.
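The boundary conditions and the shape claimed in (4.10) are easy to confirm numerically. The following sketch builds the function of (4.9) from the coefficients (4.7)-(4.8); the helper name and the numerical values are assumptions made for illustration.

```python
import math


def make_f(s, r, alpha, beta):
    """Return (a, b, f) where a = a(s), b = b(s) as in (4.7)-(4.8) and
    f(y) = r * (a * e^{alpha y} + b * e^{-beta y}) as in (4.9)."""
    denom = math.exp(alpha * s) - math.exp(-beta * s)
    a = (1.0 - math.exp(-beta * s)) / denom
    b = (math.exp(alpha * s) - 1.0) / denom

    def f(y):
        return r * (a * math.exp(alpha * y) + b * math.exp(-beta * y))

    return a, b, f


# Hypothetical values of the exponents, the reward rate r and the width s.
alpha, beta, r, s = 0.36, 0.56, 4.5, 2.0
a, b, f = make_f(s, r, alpha, beta)

assert a > 0 and b > 0
assert abs(f(0.0) - r) < 1e-12          # f_s(0) = r
assert abs(f(s) - r) < 1e-12            # f_s(s) = r
assert f(0.5 * s) < r                   # dips below r inside (0, s)
assert f(-1.0) > r and f(s + 1.0) > r   # exceeds r outside [0, s]
```

Both coefficients are positive, so the function is a positive combination of two convex exponentials, which is why strict convexity is immediate.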
This situation is pictured in Figure 3 below. From (4.2) we have
$\Gamma \pi(x) - \gamma \pi(x) = 0$ for $0 \le x \le S$, and hence $\pi$
can only satisfy (4.6) if

(4.11)  $\pi(x) = f_s(x - Q)$,  $0 \le x \le S$,

where $s \equiv S - Q$ as usual. Then (4.4) demands that

(4.12)  $L = \int_Q^S [r - f_s(x - Q)]\,dx = \int_0^s [r - f_s(y)]\,dy$

        $= rs - r\left(\frac{1}{\alpha} + \frac{1}{\beta}\right) \frac{(1 - e^{-\beta s})(e^{\alpha s} - 1)}{e^{\alpha s} - e^{-\beta s}}$.

Figure 3. Determining the Optimal Parameters (graph of $f_s(x)$)

In §5 we shall give a probabilistic interpretation for $f_s(\cdot)$ that
makes the following proposition obvious, but it can also be verified
analytically, and this is left as an exercise.

(4.13) Proposition. The right side of (4.12) increases continuously from
0 to $\infty$ as $s$ increases from 0 to $\infty$, so there is a unique
$s > 0$ satisfying (4.12).

Hereafter we assume that $s \equiv S - Q$ has been chosen to satisfy
(4.12). Next, setting $\Delta \equiv Q - q$ as in §1, we see that (4.5)
and (4.11) require

(4.14)  $c = \pi(q) = f_s(-\Delta) = r[a(s)\, e^{-\alpha\Delta} + b(s)\, e^{\beta\Delta}]$

as shown in Figure 3. It is immediate from (4.10) that there exists a
unique $\Delta > 0$ satisfying (4.14), because $c > r$ by assumption,
and we assume hereafter that $\Delta$ has been chosen in this way.
Finally, (4.3) and (4.11) demand that $Q > \Delta$ satisfy

(4.15)  $K = \int_0^q [\pi(x) - c]\,dx = \int_{-Q}^{-\Delta} [f_s(y) - c]\,dy$

        $= r\left[\frac{a(s)}{\alpha}\left(e^{-\alpha\Delta} - e^{-\alpha Q}\right) + \frac{b(s)}{\beta}\left(e^{\beta Q} - e^{\beta\Delta}\right)\right] - c(Q - \Delta)$.

It is again immediate from (4.10) that there exists a unique
$Q > \Delta$ satisfying this condition, as shown in Figure 3.
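The three-step determination of the parameters via (4.12), (4.14) and (4.15) can be sketched numerically. The paper solves these conditions in algebraic form; the code below instead uses plain bisection on each monotone condition, which is a substitute technique chosen here for simplicity. All function names and parameter values are hypothetical.

```python
import math


def bisect_increasing(fn, lo, hi, tol=1e-12, max_iter=200):
    """Root of fn on [lo, hi], assuming fn(lo) < 0 < fn(hi)."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def upper_bracket(fn, start=1.0):
    """Double start until fn becomes nonnegative."""
    hi = start
    while fn(hi) < 0.0:
        hi *= 2.0
    return hi


def optimal_band(mu, sigma, gamma, K, L, c, r):
    """Return (q, Q, S, pi) with pi(x) = f_s(x - Q) as in (4.11)."""
    root = math.sqrt(mu * mu + 2.0 * gamma * sigma * sigma)
    alpha = (root - mu) / sigma ** 2
    beta = (root + mu) / sigma ** 2

    # Step 1: s from (4.12).
    def cond_s(s):
        d = math.exp(alpha * s) - math.exp(-beta * s)
        area = (1.0 / alpha + 1.0 / beta) * (1.0 - math.exp(-beta * s)) \
            * (math.exp(alpha * s) - 1.0) / d
        return r * s - r * area - L

    s = bisect_increasing(cond_s, 1e-9, upper_bracket(cond_s))
    d = math.exp(alpha * s) - math.exp(-beta * s)
    a = (1.0 - math.exp(-beta * s)) / d
    b = (math.exp(alpha * s) - 1.0) / d

    def f(y):
        return r * (a * math.exp(alpha * y) + b * math.exp(-beta * y))

    # Step 2: Delta from (4.14), i.e. f(-Delta) = c.
    def cond_delta(delta):
        return f(-delta) - c

    delta = bisect_increasing(cond_delta, 0.0, upper_bracket(cond_delta))

    # Step 3: Q from (4.15), written as a function of w = Q - Delta > 0.
    def cond_w(w):
        Qv = delta + w
        val = r * ((a / alpha) * (math.exp(-alpha * delta) - math.exp(-alpha * Qv))
                   + (b / beta) * (math.exp(beta * Qv) - math.exp(beta * delta)))
        return val - c * w - K

    w = bisect_increasing(cond_w, 0.0, upper_bracket(cond_w))
    Q = delta + w
    return w, Q, Q + s, (lambda x: f(x - Q))


# Hypothetical data satisfying 0 < r < c.
q, Q, S, pi = optimal_band(mu=0.1, sigma=1.0, gamma=0.1, K=1.0, L=1.0, c=6.0, r=4.5)
```

In step 3 the unknown is $w = Q - \Delta$, which equals $q$, so the first returned value is the lower jump target. Bisection is valid because each condition is increasing in its unknown: the first by Proposition (4.13), the other two by the convexity statement (4.10).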
The value of $V(0)$ has been set so that $V(\cdot)$, as defined by
(4.18), will satisfy $(\Gamma V - \gamma V)(0) = 0$. Then (4.18) ensures
that $(\Gamma V - \gamma V)(x) = 0$ for all $x \in [0,S]$ because $\pi$
satisfies this same ODE. Now extend $V$ to a function on all of $R$ by
(3.7) and (3.8). Because $\pi$ satisfies (4.3) and (4.4) by
construction, we see that $V$ satisfies all the hypotheses of
Proposition (3.12). Thus $V$ is the value function for the $(q, Q, S)$
control band policy, as desired.

Our next task is to show that $V$ satisfies the hypotheses of
Proposition (2.18) and hence provides an upper bound for the value
function of any other feasible policy. First note that $V$ is
continuously differentiable on $[0,\infty)$ because $V'(S+) = r$ by
(3.8), while $V'(S-) = \pi(S-) = f_s(s-) = r$ by construction. Next, it
must be established that

(4.19)  $(\Gamma V - \gamma V)(x) \le 0$,  for all $x \ge 0$.

Of course (4.19) holds with equality on $[0,S]$. As we pass through $S$
from the left, both $V$ and $V' = \pi$ are continuous, while
$V''(\cdot) = \pi'(\cdot)$ jumps from the positive value at $S-$
pictured in Figure 1 to a zero value. (Remember that $V$ is linear with
slope $r$ to the right of $S$.) Thus $(\Gamma V - \gamma V)(S+) < 0$.
Finally, $\Gamma V$ is constant to the right of $S$, while $V$ is
increasing linearly, so $\Gamma V - \gamma V$ becomes even more negative
as we move right from $S$, and (4.19) is confirmed. To verify the
remaining hypotheses (2.20) and (2.21) of Proposition (2.18), one needs
little more than the picture of $\pi = V'$ given in Figure 1, and we
shall leave this as an exercise. Then (2.18) gives us
$V(x) \ge V^*(x)$ for all $x \ge 0$, where $V^*$ is the value function
for any other feasible policy. It remains only to show that this same
inequality holds for $x < 0$, which we also leave as an exercise.


5. Interpretations

In this section we seek to interpret the conditions (4.5) and (4.6) that
were imposed at the beginning of the previous section, and to elaborate
on the relationships (4.12), (4.14) and (4.15) that were ultimately
found to determine the optimal policy parameters. For this purpose we
fix an arbitrary control band policy, hereafter called the nominal
policy or candidate policy, with parameters $0 < q < Q < S$. Let
$Z = X + Y$ be the associated controlled process, let $V(x)$ be the
associated value function, and set $\pi(x) = V'(x)$ as in §4. Using
policy improvement logic, we shall derive three plausible necessary
conditions for optimality of the nominal policy.

If the controller starts in state $S$, immediately jumps downward to
level $x$, and thereafter follows the control band policy $(q, Q, S)$,
his total expected discounted reward will be
$\phi(x - S) + V(x) = V(x) - L + (S - x)r$. If the candidate policy is
to be optimal, then it must be that this expression is maximized by
taking $x = Q$, which obviously demands

(5.1)  $\pi(Q) = r$.

In exactly the same way, by considering the various points $x$ to which
the controller could jump from zero, we obtain the optimality condition

(5.2)  $\pi(q) = c$.

To complete the motivation of (4.5)-(4.6), we need to argue that a
necessary condition for optimality is $\pi(Q) = \pi(S)$. One can
actually obtain a much more stringent and enlightening optimality
condition by the following argument. First define
$T(y) \equiv \inf\{t \ge 0 : Z_t = y\}$,  $0 \le y \le S$,

$\theta(x,y) \equiv E_x[e^{-\gamma T(y)}]$,  $0 \le x, y \le S$.

Suppose that our controller, following the candidate control band
policy, starts in state $x$, and let $y$ be another state such that
$0 < y < x < S$ and $y < Q$. The expected present value of his total net
reward over $[0,\infty)$ is of course $V(x)$, and we define

$U(x,y) \equiv$ expected present value, when starting in state $x$ and
following the nominal policy, of net rewards earned over the period
$[0, T(y)]$.

From the strong Markov property of $X$ and the stationary character of
control band policies, it is apparent that
$V(x) = U(x,y) + \theta(x,y) V(y)$, so we have

(5.3)  $U(x,y) = V(x) - \theta(x,y) V(y)$.

Now fix $x$ and $y$ satisfying $0 < y < x < S$ and $y < Q$. Let
$\varepsilon$ be a perturbation, either positive or negative, small enough that
$0 < y+\varepsilon < Q$. Let the starting state be $x+\varepsilon$, and consider the
alternate strategy where one follows a control band policy with
parameters $(q, Q+\varepsilon, S+\varepsilon)$ up until the first time $T^*(y+\varepsilon)$ at which
level $y+\varepsilon$ is hit, and then reverts to usage of the nominal policy
ever afterward. Let


$V^*(x+\varepsilon)$ = expected present value, starting in state $x+\varepsilon$, of
net rewards earned under the alternate strategy
over $[0,\infty)$.

From  the  spatial  homogeneity  of  Brownian  motion  we  obtain 

(5.4)  $\theta(x,y) = E_x[e^{-\gamma T(y)}] = E_{x+\varepsilon}[e^{-\gamma T^*(y+\varepsilon)}]$ ,

and  similarly 

(5.5)  $U(x,y)$ = expected present value, when starting in state
$x+\varepsilon$ and following the alternate strategy, of net
rewards earned over the period $[0, T^*(y+\varepsilon)]$.

Thus,  as  a  precise  analog  to  (5.3),  we  have 

(5.6)  $V^*(x+\varepsilon) = U(x,y) + \theta(x,y)\,V(y+\varepsilon)$

$= V(x) + \theta(x,y)\,[V(y+\varepsilon) - V(y)]$ .

The last equality is obtained by substitution of (5.3). Subtracting
$V(x+\varepsilon)$ from (5.6), we see that the improvement effected by the
alternate strategy is

(5.7)  $V^*(x+\varepsilon) - V(x+\varepsilon) = \theta(x,y)\,[V(y+\varepsilon) - V(y)] - [V(x+\varepsilon) - V(x)]$

If the nominal policy is to be optimal, this expression, which vanishes
at $\varepsilon = 0$, can never be positive, so it must have a local maximum at
$\varepsilon = 0$. This obviously requires $0 = \theta(x,y)\,V'(y) - V'(x)$, or
equivalently $\pi(x) = \theta(x,y)\,\pi(y)$. We have derived this for
$0 < y < x < S$ and $y < Q$, and then by continuity we arrive at our
final optimality condition

(5.8)  $\pi(x) = \theta(x,y)\,\pi(y)$ ,  for $0 \le y \le x \le S$ and $y \le Q$ .

Because Z jumps immediately to $Q$ upon hitting $S$, we have $\theta(S,Q) = 1$,
so (5.8) implies

(5.9)  $\pi(Q) = \pi(S)$ ,

and this completes our justification for the conditions (4.5)-(4.6)
that were imposed earlier. To get more insight from (5.8), set $y = Q$
and invoke the condition $\pi(Q) = r$ derived above. This gives
$\pi(x) = r\,\theta(x,Q)$ for $Q \le x \le S$, and then the basic identity (3.6)
demands that

(5.10)  $L = r(S-Q) - [V(S)-V(Q)] = \int_Q^S [r - \pi(x)]\,dx$

$= r \int_Q^S [1-\theta(x,Q)]\,dx = r \int_Q^S E_x\{1-e^{-\gamma T(Q)}\}\,dx$ .
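
Condition (5.10) can be explored numerically. The sketch below is illustrative only and is not part of the original argument. It assumes the uncontrolled process is a Brownian motion with drift `mu` and variance `sigma2`, discounted at rate `gamma` (names chosen here). It also uses a closed form that this section does not state: for $Q \le x \le S$, $T(Q)$ is the exit time from $(Q,S)$, so $g(u) = \theta(Q+u,Q)$ solves $(\sigma^2/2)g'' + \mu g' = \gamma g$ with $g(0) = g(s) = 1$. The function names `rhs_510` and `solve_s` are hypothetical.

```python
import math


def char_roots(mu, sigma2, gamma):
    """Roots of (sigma2/2) a^2 + mu a - gamma = 0, returned as (positive, negative)."""
    disc = math.sqrt(mu * mu + 2.0 * sigma2 * gamma)
    return (-mu + disc) / sigma2, (-mu - disc) / sigma2


def rhs_510(s, r, mu, sigma2, gamma):
    """Right-hand side of (5.10), r * int_Q^S [1 - theta(x, Q)] dx, as a function of s = S - Q."""
    if s <= 0.0:
        return 0.0
    rp, rm = char_roots(mu, sigma2, gamma)
    # Integral of g over [0, s] in closed form, using only non-positive exponents.
    num = math.expm1(rm * s) * math.expm1(-rp * s) * (1.0 / rp - 1.0 / rm)
    den = -math.expm1(-(rp - rm) * s)
    return r * (s - num / den)


def solve_s(L, r, mu, sigma2, gamma, tol=1e-12):
    """Bisection for the s > 0 with rhs_510(s) = L; assumes L > 0 and r > 0."""
    lo, hi = 0.0, 1.0
    while rhs_510(hi, r, mu, sigma2, gamma) < L:
        hi *= 2.0  # expand the bracket until the right-hand side reaches L
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if rhs_510(mid, r, mu, sigma2, gamma) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only (not taken from the paper).
s_star = solve_s(L=0.1, r=1.0, mu=0.0, sigma2=1.0, gamma=0.5)
```

In the driftless case with `sigma2 = 1` and `gamma = 0.5`, the expression in `rhs_510` collapses to $r\,[s - 2\tanh(s/2)]$, which gives a convenient hand check of the code.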

After a bit of reflection, one realizes that the right-hand side of
(5.10) depends only on $s = S-Q$ and that it increases continuously
from 0 to $\infty$ as $s$ increases from 0 to $\infty$. Thus (5.10) uniquely
determines the value of $s$ for an optimal control band policy, and it
is just the probabilistic articulation of the analytical condition
(4.12) derived earlier. From the definitive analytical properties of
the function $f_s$ introduced in §4, one can easily verify the
interpretation

$f_s(x) = r\,E_x[e^{-\gamma T(Q)}]$ ,

which establishes the equivalence of (4.12) and (5.10). To determine
the policy parameter $A = Q-q$ from (5.8), set $x = Q$ and $y = q$, and
use the fact that $\pi(q) = c$ by (5.2) while $\pi(Q) = r$ by (5.1). Then
(5.8) gives

(5.11)  $r = c\,\theta(Q,q) = c\,E_Q[e^{-\gamma T(q)}]$ .
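
As a hedged numerical illustration of (5.11), the sketch below evaluates $\theta(Q,q)$ and solves for the gap $Q-q$ once $s = S-Q$ is known. It rests on the same assumptions as any closed-form treatment of $\theta$: Brownian motion with drift `mu` and variance `sigma2`, discount rate `gamma`, and the boundary-value characterization $h(q) = 1$, $h(S) = h(Q)$ for $h(x) = \theta(x,q)$, none of which is spelled out in this section. The names `theta_Qq` and `solve_gap` are hypothetical, and `gap` stands for $Q-q$.

```python
import math


def char_roots(mu, sigma2, gamma):
    """Roots of (sigma2/2) a^2 + mu a - gamma = 0, returned as (positive, negative)."""
    disc = math.sqrt(mu * mu + 2.0 * sigma2 * gamma)
    return (-mu + disc) / sigma2, (-mu - disc) / sigma2


def theta_Qq(gap, s, mu, sigma2, gamma):
    """theta(Q, q) for gap = Q - q >= 0 and s = S - Q > 0.

    h(x) = theta(x, q) solves (sigma2/2) h'' + mu h' = gamma h on (q, S)
    with h(q) = 1 and h(S) = h(Q), since hitting S forces a jump to Q.
    """
    rp, rm = char_roots(mu, sigma2, gamma)
    # All exponents below are non-positive, which avoids overflow.
    num = math.exp(rm * gap) * (1.0 - math.exp((rm - rp) * s))
    den = (1.0 - math.exp(-rp * s)) + math.exp((rm - rp) * gap) * (
        math.exp(-rp * s) - math.exp((rm - rp) * s)
    )
    return num / den


def solve_gap(r, c, s, mu, sigma2, gamma, tol=1e-12):
    """Bisection for the gap with c * theta(Q, q) = r; requires 0 < r < c."""
    if not 0.0 < r < c:
        raise ValueError("need 0 < r < c")
    lo, hi = 0.0, 1.0
    while c * theta_Qq(hi, s, mu, sigma2, gamma) > r:
        hi *= 2.0  # c * theta decreases in the gap, so widen until it drops below r
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if c * theta_Qq(mid, s, mu, sigma2, gamma) > r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only (not taken from the paper).
gap_star = solve_gap(r=1.0, c=2.0, s=1.3, mu=0.0, sigma2=1.0, gamma=0.5)
```

In the driftless case with `sigma2 = 1` and `gamma = 0.5`, the formula reduces to $\cosh(s/2)/\cosh(Q-q+s/2)$, which equals 1 at zero gap and decays to 0 as the gap grows.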

With $s$ already determined, the right-hand side of (5.11) depends
only on $A = Q-q$, and it decreases continuously from $c$ to 0 as $A$
increases from 0 to $\infty$. Thus (5.11) uniquely determines $A$ for the
optimal control band policy, and one can easily show that it is
equivalent to the analytical condition (4.14) derived earlier. Once
$s$ and $A$ have been set, we can determine $q$ from (5.8) as follows.
With $x = q$ and $0 \le y \le q$, (5.8) reduces to $c = \theta(q,y)\,\pi(y)$
because $\pi(q) = c$ by (5.2). Then the basic identity (4.19) requires
that

(5.12)  $K = \int_0^q \pi(y)\,dy - cq = c \int_0^q [1/\theta(q,y) - 1]\,dy$

$= c \int_0^q \{E_q(1 - e^{-\gamma T(y)}) / E_q[e^{-\gamma T(y)}]\}\,dy$ ,

which  is  the  probabilistic  articulation  of  our  analytical  condition 
(4.15). 
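
The last step, solving (5.12) for $q$, can be sketched numerically in the same spirit. The code below is an illustration under stated assumptions, not the paper's method. It again assumes Brownian motion with drift `mu`, variance `sigma2` and discount rate `gamma`. It also uses the factorization $\theta(Q,y) = \theta(Q,q)\,\theta(q,y)$ for $y \le q$, which follows from applying (5.8) to the pairs $(Q,q)$, $(Q,y)$ and $(q,y)$, so that $1/\theta(q,y) = \theta(Q,q)/\theta(Q,y)$. The names `theta_from_Q`, `rhs_512` and `solve_q` are hypothetical, and `gap` stands for $Q-q$.

```python
import math


def char_roots(mu, sigma2, gamma):
    """Roots of (sigma2/2) a^2 + mu a - gamma = 0, returned as (positive, negative)."""
    disc = math.sqrt(mu * mu + 2.0 * sigma2 * gamma)
    return (-mu + disc) / sigma2, (-mu - disc) / sigma2


def theta_from_Q(dist, s, mu, sigma2, gamma):
    """theta(Q, Q - dist) for dist >= 0, with s = S - Q > 0 (jump S -> Q at the top)."""
    rp, rm = char_roots(mu, sigma2, gamma)
    num = math.exp(rm * dist) * (1.0 - math.exp((rm - rp) * s))
    den = (1.0 - math.exp(-rp * s)) + math.exp((rm - rp) * dist) * (
        math.exp(-rp * s) - math.exp((rm - rp) * s)
    )
    return num / den


def rhs_512(q, c, gap, s, mu, sigma2, gamma, n=2000):
    """c * int_0^q [1/theta(q, y) - 1] dy by Simpson's rule (n must be even)."""
    if q <= 0.0:
        return 0.0
    top = theta_from_Q(gap, s, mu, sigma2, gamma)

    def integrand(d):
        # d = q - y; 1/theta(q, y) = theta(Q, q) / theta(Q, y)
        return top / theta_from_Q(gap + d, s, mu, sigma2, gamma) - 1.0

    h = q / n
    total = integrand(0.0) + integrand(q)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return c * total * h / 3.0


def solve_q(K, c, gap, s, mu, sigma2, gamma, tol=1e-10):
    """Bisection for the q > 0 with rhs_512(q) = K; assumes K > 0 and c > 0."""
    lo, hi = 0.0, 1.0
    while rhs_512(hi, c, gap, s, mu, sigma2, gamma) < K:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if rhs_512(mid, c, gap, s, mu, sigma2, gamma) < K:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only (not taken from the paper).
q_star = solve_q(K=0.5, c=2.0, gap=0.7, s=1.3, mu=0.0, sigma2=1.0, gamma=0.5)
```

The integrand is nonnegative because $\theta(Q,\cdot)$ falls as the target level moves farther below $Q$, so the right-hand side of (5.12) increases with $q$ and bisection applies.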